An Empirical Study of Differences between Conversion Schemes and Annotation Guidelines

Author

  • Anders Søgaard
Abstract

We establish quantitative methods for comparing and estimating the quality of dependency annotations or conversion schemes. We use generalized tree-edit distance to measure divergence between annotations, and we propose theoretical learnability, derivational perplexity, and downstream performance as evaluation criteria. We present systematic experiments with tree-to-dependency conversions of the Penn-III treebank, as well as observations from experiments using treebanks from multiple languages. Our most important observations are: (a) parser bias makes most parsers insensitive to non-local differences between annotations, but (b) the choice of annotation nevertheless has a significant impact on most downstream applications, and (c) while learnability does not correlate with downstream performance, learnable annotations lead to more robust performance across domains.
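
The abstract measures divergence between two annotations of the same sentence with tree-edit distance. The Python sketch below is a rough illustration only. It computes plain ordered tree edit distance between two dependency analyses, using the Zhang-Shasha algorithm with unit insert, delete and relabel costs. This is not necessarily the generalized variant used in the paper. The head-list encoding and the names `from_heads` and `tree_edit_distance` are hypothetical, chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # ordered left to right


def from_heads(words, heads):
    """Build an ordered tree from a (hypothetical) head-list encoding:
    heads[i] is the 1-based head of token i+1; 0 means the artificial ROOT."""
    nodes = [Node("ROOT")] + [Node(w) for w in words]
    for dep, head in enumerate(heads, start=1):
        nodes[head].children.append(nodes[dep])  # dependents stay in sentence order
    return nodes[0]


def _postorder(root):
    """Return postorder labels and, per node, the index of its leftmost leaf."""
    labels, leftmost = [], []

    def visit(node):
        first = None
        for child in node.children:
            leaf = visit(child)
            if first is None:
                first = leaf
        labels.append(node.label)
        leftmost.append(len(labels) - 1 if first is None else first)
        return leftmost[-1]

    visit(root)
    return labels, leftmost


def _keyroots(leftmost):
    # For each leftmost-leaf value, keep the highest postorder index that has it.
    last = {}
    for i, lm in enumerate(leftmost):
        last[lm] = i
    return sorted(last.values())


def tree_edit_distance(t1, t2):
    """Zhang-Shasha ordered tree edit distance with unit costs (a sketch)."""
    l1, lmd1 = _postorder(t1)
    l2, lmd2 = _postorder(t2)
    td = [[0] * len(l2) for _ in l1]  # td[a][b]: distance between subtrees a, b

    for i in _keyroots(lmd1):
        for j in _keyroots(lmd2):
            li, lj = lmd1[i], lmd2[j]
            rows, cols = i - li + 2, j - lj + 2
            fd = [[0] * cols for _ in range(rows)]  # forest distances
            for x in range(1, rows):
                fd[x][0] = fd[x - 1][0] + 1  # delete
            for y in range(1, cols):
                fd[0][y] = fd[0][y - 1] + 1  # insert
            for x in range(1, rows):
                for y in range(1, cols):
                    a, b = li + x - 1, lj + y - 1
                    if lmd1[a] == li and lmd2[b] == lj:
                        # Both forests are whole subtrees: match or relabel roots.
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[x - 1][y - 1] + (l1[a] != l2[b]))
                        td[a][b] = fd[x][y]
                    else:
                        # Reuse a subtree distance from an earlier keyroot pair.
                        p, q = lmd1[a] - li, lmd2[b] - lj
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[p][q] + td[a][b])
    return td[-1][-1]


if __name__ == "__main__":
    words = ["she", "has", "left"]
    lexical = from_heads(words, [3, 3, 0])     # main verb "left" as head
    functional = from_heads(words, [2, 0, 2])  # auxiliary "has" as head
    print(tree_edit_distance(lexical, functional))  # → 2
```

The two head lists encode "she has left" under two conventions. One treats the main verb as head; the other treats the auxiliary as head. The resulting trees differ by two relabelings, so the distance is 2. Averaging such distances over a treebank would give one simple corpus-level divergence figure.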

Publication date: 2013